Rejection grammar processing method in utterance speech recognition
system

Patent Assignee: GTE INTERNETWORKING INC (SYLV ) 
Inventor: SHU C 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6016470 A 20000118 US 97969031 A 19971112 200012 B 

Priority Applications (No Type Date): US 97969031 A 19971112
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
US 6016470 A 14 G10L-005/06 

Abstract (Basic) : US 6016470 A 

NOVELTY - A digitized sub-list of phoneme models is selected from a
list of phoneme models, and a digitized sequential representation of an
utterance is presented. A set of probabilities indicating how well the
sequential representation matches sequential combinations of the sub-list
is calculated.

DETAILED DESCRIPTION - The digitized sequential representation of 
utterance is compared with sequence of generated digitized list of 
words using an accepted main grammar process and a second set of 
probabilities is calculated. The highest probabilities for each of the 
two sets is determined. If the highest probability is found in first 
set of probabilities, the utterance is rejected and if found in the 
second set, the utterance is accepted. The selected digitized list is 
generated by compiling a complete list of all phoneme models in a 
language and forming a test set of digitized sequential representation 
of utterance with acceptable and rejectable parts. The results of test 
set are analyzed and a false rejection list, false acceptance list and 
their statistics are accumulated. If the statistics of the process are
acceptable, the list of models is used; if not, the models are further
processed until they become acceptable. INDEPENDENT CLAIMS
are also included for the following:

(a) Computer system for speech recognition ; 

(b) speech recognition program product 
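
The accept/reject decision in the detailed description above can be illustrated with a minimal sketch. It assumes each grammar yields a dictionary of log-probability scores keyed by hypothesis; the function name, the score format, and the choice to accept on a tie are illustrative assumptions, not details from the patent record.

```python
def accept_utterance(main_scores, rejection_scores):
    """Decide whether to accept an utterance.

    main_scores: hypothesis -> score from the main (word) grammar.
    rejection_scores: hypothesis -> score from the rejection (phoneme) grammar.
    Returns (accepted, best_main_hypothesis_or_None).
    """
    best_main, best_main_score = max(main_scores.items(), key=lambda kv: kv[1])
    best_rej_score = max(rejection_scores.values())
    # The utterance is rejected when the overall best score comes from the
    # rejection grammar. Ties are accepted here (an assumption).
    if best_rej_score > best_main_score:
        return False, None
    return True, best_main
```

For example, `accept_utterance({"yes": -10.0, "no": -14.0}, {"ph_seq_1": -12.0})` accepts "yes", because the best main-grammar score exceeds the best rejection-grammar score.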

USE - In utterance speech recognition systems. 

ADVANTAGE - The main grammars are with large vocabularies of about 
thirty phonemes. The system requires only small memories thereby 
reducing cost and speech recognition is accurate and faster. 

DESCRIPTION OF DRAWING (S) - The figure shows the flowchart of 
method of selecting phonemes. 

pp; 14 DwgNo 5/7 

Title Terms: REJECT; GRAMMAR; PROCESS; METHOD; SPEECH; RECOGNISE; SYSTEM 
Derwent Class: P86; T01; W04 

International Patent Class (Main) : G10L-005/06 
File Segment: EPI; EngPI 

Speech signal encoding technique uses analysis of fundamental frequency
and harmonics to provide quality reproduction
Patent Assignee: MATRA NORTEL COMMUNICATIONS (NELE); MATRA NORTEL
COMMUNICATIONS SAS (NELE)
Inventor: CAPMAN F; MURGIA C 

Number of Countries: 095 Number of Patents: 004 
Patent No Kind Lan Pg Main IPC
Based on patent WO 200103120

Based on patent WO 200103120 
AT BE CH CY DE DK ES FI FR GB GR IE IT 



Abstract (Basic) : 

audio encoding technique, the encoder estimates the fundamental 
frequency (F0) of an audio signal, and determines a spectrum of the 
audio signal by a transform of a frame of the audio signal in the 
frequency domain. Data representing the spectral amplitudes associated 
with. . . 

...module in the neighborhood of this frequency multiple. Data is obtained 
by use of the cepstral coefficients , calculated by transforming in 
the cepstral domain a compressed higher envelope of the spectrum of... 
Audio encoding process for speech transmission includes use of cepstral
coefficients and interpolation in decoding
Patent Assignee: MATRA NORTEL COMMUNICATIONS (NELE); MATRA NORTEL
COMMUNICATIONS SAS (NELE)
Priority Applications (No Type Date): FR 998635 A 19990705
Patent Details : 

Patent No Kind Lan Pg Main IPC Filing Notes 

WO 200103118 A1 F 52 G10L-019/02

Designated States (National) : AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA 
CH CN CR CU CZ DE DK DM DZ EE ES FI GB GD GE GH GM HR HU ID IL IN IS JP 
KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NO NZ PL PT 
RO RU SD SE SG SI SK SL TJ TM TR TT TZ UA UG US UZ VN YU ZA ZW 
Designated States (Regional) : AT BE CH CY DE DK EA ES FI FR GB GH GM GR 
IE IT KE LS LU MC MW MZ NL OA PT SD SE SL SZ TZ UG ZW 

AU 200062920 A G10L-019/02 Based on patent WO 200103118

FR 2796191 A1 G10L-019/02

EP 1192619 A1 F G10L-019/02 Based on patent WO 200103118

Designated States (Regional) : AL AT BE CH CY DE DK ES FI FR GB GR IE IT 
LI LT LU LV MC MK NL RO SI 

Audio encoding process for speech transmission includes use of cepstral 
coefficients and interpolation in decoding 

Abstract (Basic) : 

The method uses a decoder to synthesise a set of
successive frames of N samples of an audio signal from encoding data
included in a digital flow received from the encoder. For
only one subset of frames, this data includes data representing spectral
amplitudes of the audio signal. For each of the frames of the subset,
the decoder determines the cepstral coefficients (cxq(n))
representing at least some of the spectral amplitudes. For frames not
forming part of the subset, it interpolates the cepstral
coefficients and generates a spectral estimate of the audio signal,
which it transforms into the time domain to obtain the synthesised
frame.
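
The interpolation step for frames outside the transmitted subset can be sketched as follows. The record does not state the interpolation rule, so linear interpolation between the nearest transmitted frames is an assumption, as are the function name and the data layout.

```python
def interpolate_cepstra(known, num_frames):
    """Fill in cepstral vectors for frames that carry no spectral data.

    known: frame index -> list of cepstral coefficients (equal lengths).
    Returns one cepstral vector per frame, linearly interpolated between
    the nearest transmitted frames (assumed rule) and held constant at
    the edges.
    """
    if not known:
        raise ValueError("at least one transmitted frame is required")
    indices = sorted(known)
    result = []
    for t in range(num_frames):
        if t in known:
            result.append(list(known[t]))
            continue
        prev = max((i for i in indices if i < t), default=None)
        nxt = min((i for i in indices if i > t), default=None)
        if prev is None:
            result.append(list(known[nxt]))
        elif nxt is None:
            result.append(list(known[prev]))
        else:
            w = (t - prev) / (nxt - prev)
            result.append([(1 - w) * a + w * b
                           for a, b in zip(known[prev], known[nxt])])
    return result
```

With cepstra transmitted for frames 0 and 4 only, frame 2 receives the midpoint of the two vectors.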
(c) 2003 Thomson Derwent. All rts. reserv.

011526848 **Image available** 

WPI Acc No: 1997-503334/199746 

XRPX Acc No: N97-419499 

Feature extractor for automated speech system - calculate logarithm of 
input frame spectrum and cepstrum of this logarithm, also detecting 
cepstral coefficient meeting predetermined criterion, and derives 
feature of detected cepstral coefficient representing voiced speech 
frame 

Patent Assignee: BRITISH TELECOM PLC (BRTE ) 
Inventor: POWER K J; RINGLAND SPA 
Number of Countries: 027 Number of Patents: 002 
Patent Family: 



Patent No Kind Date Applicat No Kind Date Week 

WO 9737345 A1 19971009 WO 97GB816 A 19970324 199746 B

AU 9721669 A 19971022 AU 9721669 A 19970324 199808 



Priority Applications (No Type Date) : EP 96302235 A 19960329 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 

WO 9737345 A1 E 24 G10L-009/00

Designated States (National): AU CA CN JP KR MX NO NZ SG US 

Designated States (Regional): AT BE CH DE DK ES FI FR GB GR IE IT LU MC 

NL PT SE 

AU 9721669 A G10L-009/00 Based on patent WO 9737345 

calculate logarithm of input frame spectrum and cepstrum of this 
logarithm, also detecting cepstral coefficient meeting predetermined 
criterion, and derives feature of detected cepstral coefficient 
representing voiced speech frame 

. . .Abstract (Basic) : The feature extractor receives an input digital signal 
which is divided into frames , and calculates the logarithm of the 
spectrum of an input frame . A cepstrum calculator (334) calculates 
the cepstrum of the logarithm of the spectrum of the frame . 

. . .A pitch detector (335) detects a cepstral coefficient meeting a 
predetermined criterion. A feature deriver (336) derives a feature 
relating to the detected cepstral coefficient , which represents 
whether the input frame includes voiced speech. The spectrum 
calculator evaluates the logarithm of the power spectrum of the input 
signal . . . 

...speech processing. For extracting features from input signal for use by 
subsequent automated speech systems. Cepstral coefficients lying 
inside normal speech frequency range, including first twenty cepstral 
coefficients , may be discarded 
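
The cepstrum-based voicing feature described in this record can be sketched in a few lines. The record only says that a cepstral coefficient "meeting a predetermined criterion" is detected; the criterion used here (largest coefficient within a plausible pitch range, compared with a threshold) and all parameter values are assumptions for illustration. A plain DFT is used to keep the sketch dependency-free.

```python
import cmath
import math

def real_cepstrum(frame):
    """Cepstrum of the log power spectrum of one frame (plain O(N^2) DFT)."""
    n = len(frame)
    spectrum = [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) for k in range(n)]
    # Small floor avoids log(0) for empty spectral bins.
    log_power = [math.log(abs(s) ** 2 + 1e-12) for s in spectrum]
    return [sum(log_power[k] * cmath.exp(2j * math.pi * k * q / n)
                for k in range(n)).real / n for q in range(n)]

def voicing_feature(frame, sample_rate, f_min=60.0, f_max=400.0, threshold=1.0):
    """Return (peak_quefrency, peak_value, is_voiced) for one frame.

    f_min, f_max and threshold are hypothetical tuning values.
    """
    ceps = real_cepstrum(frame)
    q_lo = max(1, int(sample_rate / f_max))
    q_hi = min(int(sample_rate / f_min), len(frame) // 2)
    if q_lo > q_hi:
        raise ValueError("frame too short for the requested pitch range")
    peak_q = max(range(q_lo, q_hi + 1), key=lambda q: ceps[q])
    return peak_q, ceps[peak_q], ceps[peak_q] > threshold
```

A frame containing a pulse train with a 40-sample period produces a cepstral peak at quefrency 40, while a single isolated pulse produces no peak and is reported as unvoiced.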

...Title Terms: FRAME;
Adaptive method for speaker identification and verification - converting
audio input to frames that have adaptive weighting component applied for
normalisation prior to recognition processing
Patent Assignee: UNIV RUTGERS STATE NEW JERSEY (RUTF)
Inventor: ASSALEH K T; MAMMONE R J
Number of Countries: 060 Number of Patents: 009 
Patent Family: 
Priority Applications (No Type Date): US 94203988 A 19940228
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 

WO 9523408 A1 E 34 G10L-005/06

Designated States (National) : AM AT AU BB BG BR BY CA CH CN CZ DE DK ES 
FI GB GE HU JP KE KG KP KR KZ LK LT LU LV MD MG MN MW MX NL NO NZ PL PT 
RO RU SD SE SG SI SK TJ TT UA UG UZ VN 

Designated States (Regional): AT BE CH DE DK ES FR GB GR IE IT LU MC NL 
OA PT SE 

AU 9521164 A G10L-005/06 Based on patent WO 9523408 

US 5522012 A 13 G10L-005/06 

EP 748500 A1 E 34 G10L-005/06 Based on patent WO 9523408

Designated States (Regional) : AT BE CH DE DK ES FR GB GR IE IT LI LU MC 
NL PT SE 

AU 683370 B G10L-005/06 Previous Publ . patent AU 9521164 

Based on patent WO 9523408 
JP 10500781 W 36 G10L-003/00 Based on patent WO 9523408 

MX 9603686 A1 G10L-005/06
CN 1142274 A G10L-005/06 
MX 194244 B G10L-005/006 

converting audio input to frames that have adaptive weighting
component applied for normalisation prior to recognition processing

...Abstract (Basic): over a channel such as a telephone line. The input is 
digitised and converted to frames that are analysed by linear 
prediction. This extracts prediction coefficients... 

...Prediction coefficients are derived from the normalised speech to allow 
for identification. The pattern is compared to a number of speech 
patterns produced by a number of persons in advance... 

...Abstract (Equivalent): windowing a speech segment into a plurality of 
speech frames ; 



. . . determining linear prediction coefficients from a linear predictive 
polynomial for each said frame of speech... 

. . . determining a first cepstral coefficient from said linear 

prediction coefficients in which first cepstrum information comprises 
said first cepstral coefficient ; 



. . . determining a plurality of roots of said linear prediction polynomial 
from the poles of said all... 

...selecting one of said frames having a predetermined number of said 
roots within a unit circle of the z-plane in which said selected 
frames form said predetermined components of said first cepstrum 
information. . . 

...an adaptive component weighting cepstrum to attenuate broad bandwidth
components in said speech signal, by determining a finite impulse
response filter for emphasizing the speech formants of said speech
signal and attenuating said residue components comprising the steps of
determining a finite impulse response filter for emphasizing the
speech formants of said speech signal and attenuating said residue
components, determining adaptive component weighting coefficients
from said finite impulse response filter, determining a second
cepstral coefficient from said adaptive component weighting
coefficients, and subtracting said second cepstral coefficient from
said first cepstral coefficient for forming said adaptive component
weighting cepstrum; and
Title Terms: FRAME ; 
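
The step "determining a first cepstral coefficient from said linear prediction coefficients" is commonly done with a standard recursion. The sketch below shows that recursion under an assumed sign convention, A(z) = 1 + a1 z^-1 + ... + ap z^-p with the all-pole model 1/A(z); it illustrates the general technique and is not taken from the patent text.

```python
def lpc_to_cepstrum(a, n_ceps):
    """Convert LPC coefficients to cepstral coefficients c1..c_n_ceps.

    a: [a1, ..., ap] for A(z) = 1 + sum_k a_k z^-k (assumed convention);
    the cepstrum is that of the all-pole filter 1/A(z).
    Recursion: c_n = -a_n - (1/n) * sum_{k=1}^{n-1} k * c_k * a_{n-k}.
    """
    p = len(a)
    c = []
    for n in range(1, n_ceps + 1):
        acc = -a[n - 1] if n <= p else 0.0
        for k in range(1, n):
            if n - k <= p:
                acc -= (k / n) * c[k - 1] * a[n - k - 1]
        c.append(acc)
    return c
```

For a single real pole at r (so a = [-r]), the cepstrum is r^n / n, which gives a quick sanity check of the recursion.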
Mobile telephone speech transmission appts. with speech recognition
control - uses signal processing circuit to convert GSM coding parameters
into speech recognition parameters
Patent Assignee: PHILIPS PATENTVERWALTUNG GMBH (PHIG); PHILIPS
GLOEILAMPENFAB NV (PHIG)
Inventor: HIRSCH H; RUEHL H 

Number of Countries: 004 Number of Patents: 004 
Patent Family: 
Priority Applications (No Type Date): DE 4126882 A 19910814
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
EP 527535 A2 G 12 G10L-005/06 

Designated States (Regional): DE FR GB 
DE 4126882 A1 G10L-005/06
JP 5241590 A G10L-003/00 
EP 527535 A3 G10L-005/06 

...Abstract (Basic): mobile telephone includes a speech coder (3) for 
encoding digital speech signals within a time frame using coding 
parameters. A signal processing circuit (6) receives part of the set of 
coding. . . 

...The evaluating circuit converts logarithmic area ratio coefficients
from the coding parameters into cepstral coefficients. The
evaluating circuit determines speech parameters by comparing sets 
of coding parameters containing long term prediction gain factors with 
a selected threshold value... 
Rejection method for speech recognition - deriving
parameters derived from sequence frames with best choice matches to
references used for unknow

Patent Assignee: LENNIG M (LENN-I); NORTHERN TELECOM LTD (NELE ) 
Inventor: LENNIG M 

Number of Countries: 002 Number of Patents: 003 
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

CA 2013263 A 19910928 CA 2013263 A 19900328 199150 

US 5097509 A 19920317 US 90501993 A 19900328 199214 

CA 2013263 C 19950905 CA 2013263 A 19900328 199542 
Priority Applications (No Type Date): CA 2013263 A 19900328; US 90501993 A
19900328
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 

US 5097509 A 11 

CA 2013263 C G10L-005/06 

deriving parameters derived from 

sequence frames with best choice matches to references used for unknow 

. . .Abstract (Basic) : method for speech recognition comprises representing 
an unknown utterance as a first sequence of parameter frames . Each 
parameter frame includes a set of primary and secondary parameters 
and an equalised second sequence of parameter frames derived from the
first sequence of parameter frames. Each of the primary and
secondary parameters in the sequence of parameter frames of the 
representation of the unknown utterance are compared to each of a 
number of reference representations expressed in the same kind of 
parameters, to determine how closely each reference representation 
resembles the representation of the unknown utterance... 

...Abstract (Equivalent): differences between pairs of primary cepstra. The 
equalised representation being the signed difference of each cepstral 
coefficient less an average value of the coefficients... 

...Factors are generated from the ordered lists of templates to determine 
the probability of the top choice being a correct acceptance, with 
different methods, being a... 
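
The equalised representation mentioned above (each cepstral coefficient less an average value) can be sketched as a mean subtraction. The record does not say over what the average is taken; averaging each coefficient over all frames of the utterance is an assumption made here.

```python
def equalise(frames):
    """Subtract the per-coefficient mean from every frame.

    frames: list of cepstral vectors of equal length, one per frame.
    The mean is taken over all frames of the utterance (an assumption).
    """
    if not frames:
        return []
    count = len(frames)
    means = [sum(column) / count for column in zip(*frames)]
    return [[c - m for c, m in zip(frame, means)] for frame in frames]
```

Because a stationary channel adds a constant offset to each cepstral coefficient, removing the mean makes the result less sensitive to the transmission channel.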

...Title Terms: FRAME ; 
Speech recognition system using linear predictive coding - compares
received speech frames with reference templates, generates error
values, and selects words with small error values

Patent Assignee: TEXAS INSTR INC (TEXI ) 

Inventor: ANDERSON W; DODDINGTON G R; MCMAHAN M L; RAJASEKARAN P K;
RAJASEKARA P K
Number of Countries: 006 Number of Patents: 005 
Patent Family: 
Priority Applications (No Type Date): US 8779563 A 19870730
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
EP 302663 A E 9 

Designated States (Regional): DE FR GB IT 
US 4910784 A 9 
EP 302663 B1 E 13 G10L-005/06

Designated States (Regional): DE FR GB IT 
DE 3884880 G G10L-005/06 Based on patent EP 302663 

KR 123934 B1 G10L-005/06

compares received speech frames with reference templates,
generates error values, and selects words with small error values

...Abstract (Basic): a digital representation. A feature extractor coupled 
to the digitizer groups the digital signals into frames and generates 
a transform of the signal of each frame . The transform has a number 
of feature coefficients, and each feature coefficient has a 
corresponding. . . 

...than a preselected threshold for that coefficient. A queue coupled to 
the feature extractor receives frames of binary feature coefficients 
and arranges them in consecutive order. . . 

. . .A comparator coupled to the queue compares several speech frames 
with a number of reference templates having frames of binary feature 
coefficients and generates a number of error values indicating the 
closeness of the match between them. A decision controller coupled to 
the comparator receives the results of the comparisons , and selects 
a best match between a portion of a speech utterance and the reference 

...Abstract (Equivalent): 18) coupled to said A/D converter (16) for
grouping the speech samples into speech frames and generating LPC
parameters for each said speech frame , transforming said LPC 
parameters into cepstral parameters and deriving a frame of binary 
feature coefficients by coding said cepstral parameters into binary 
values each indicating a value greater or less than a preselected 
threshold and grouping the said binary values into said frames of 
binary features coefficients, a push-down queue (40) coupled to said 
feature extractor for receiving successive frames of binary feature 
coefficients corresponding to the successive speech frames ; a 
comparator (20) coupled to said queue (40) for comparing a plurality 
of the last received said frames of binary feature coefficients with 
a plurality of reference templates (22) each consisting of a plurality 
of frames of binary coefficients and generating a plurality of error 
values indicating the closeness of the match therebetween wherein only 
alternate frames in said queue (40) are used by said comparator 
(20) for the comparison with the templates (22), the number of said 
alternate frames being a function of the template length, and a 
decision controller (24) coupled to said comparator for receiving the 
results of the comparisons , and for selecting a best match between a 
portion of the speech signal and the. . . 

...Abstract (Equivalent): speech signal. A feature extractor coupled to the 
digitiser groups the digital speech signals into frames and generates 
a transform of the digital speech signals as grouped in each frame . 
The transform has a number of feature coefficients, and each feature 
coefft. has a corresp. . . 



...A queue is coupled to the feature extractor to receive frames of
binary feature coefficients as speech frames and arrange them in
consecutive order. A comparator is coupled to the queue to compare
a number of speech frames with a number of reference templates that
have frames of binary feature coefficients and to generate a number of
error values indicating the closeness of...

The reference templates are representative of different words. A 
decision controller is coupled to the comparator to receive the 
results of the comparisons , and to select a best match between a 
portion of a speech utterance as represented by the speech frames and 
the reference templates...

USE/ADVANTAGE - With mainframe, mini- or microcomputer. Flexible and
accurate vocabulary enrolment.
Title Terms: COMPARE ; 
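
The binary-feature matching in this record can be illustrated with a short sketch. It assumes the error value is a Hamming distance (count of differing bits) and that each template is compared with the most recent frames in the queue; the record does not name the distance, and the alternate-frame selection mentioned in the equivalent abstract is omitted for brevity.

```python
def binarize(cepstra, thresholds):
    """Code each cepstral parameter as 1 if above its threshold, else 0."""
    return [1 if c > t else 0 for c, t in zip(cepstra, thresholds)]

def hamming_error(frames, template):
    """Count differing bits between queued frames and a template."""
    return sum(a != b
               for frame, ref in zip(frames, template)
               for a, b in zip(frame, ref))

def best_match(frames, templates):
    """Return (word, error) for the template with the smallest error.

    frames: queue of binary feature frames, oldest first.
    templates: word -> list of binary feature frames.
    """
    errors = {word: hamming_error(frames[-len(tpl):], tpl)
              for word, tpl in templates.items()}
    word = min(errors, key=errors.get)
    return word, errors[word]
```

Binary features reduce each comparison to bit counting, which suits the small processors of the period.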
...Abstract (Equivalent): calling a remote modem from a local modem over a
telephone line connection, line a portion of which includes a 
cellular telephone link connection compression section having means 
for... 

...e.) compressing the outgoing digital voice data into compressed outgoing 
digital voice data frames , 



. ..h.) decompressing the compressed incoming digital voice data frames 
into the incoming digital voice data... 

...2.) the memory connected to the voice compression section and to a 
data transmission section ; and... 

...3.) the data transmission section having means for... 

...a.) receiving the compressed outgoing digital voice data frames from 
the memory. . . 

...c.) placing the compressed outgoing digital voice data frames into
compressed outgoing digital voice data packets... determining a power
of at least a portion of the compressed outgoing digital voice data
packet including a plurality of samples of the local voice signals as
a function of the summation of the square of each sample over the
portion of the compressed outgoing digital voice data packet; and...

. . . comparing the power of the portion of the compressed outgoing 
digital voice data packets to a preselected threshold to indicate 
whether ... voice signals into discrete samples of digital voice data and 
collecting the discrete samples into segments ; 



means for dividing the segments into subsegments and for producing 
therefrom a current voice subsegment . . . 

pitch prediction means for determining the long term predicted gain of 
the current voice subsegment by comparing the current voice 
subsegment to reconstructed voice samples to produce a pitch predictor 
gain and. . .means for determining the peak amplitude of the long 
term residual samples... 

means for scaling the long term residual samples based on the peak 
amplitude to produce normalized long term residual samples... 

means including a code book stored in a memory for comparing the 
normalized long term residual samples to stored distinct normalized 
long term residual samples stored. . . 

for providing the distinct memory address, the pitch predictor gain, the 
lag component and the peak amplitude for each voice subsegment... 
creating a qualified packet having a qualified packet identifier and a 
plurality of command identifiers for communicating control information 
...if in voice over data mode of operation, the acoustic echo 
cancellation means including a first Finite Impulse Response filter 
and operable in response to the selected line echo cancelled incoming 
digital voice data or the selected decompressed incoming digital voice 
data for removing acoustic echo from the outgoing digital voice data 
and for producing therefrom acoustic echo cancelled outgoing data, the 
line echo cancellation means including a second Finite Impulse 
Response filter and operable in response to the acoustic echo cancelled 
outgoing digital... 
using vocoder to compare one or more correlation peaks with clipping
threshold value, so that additional calculations may be performed if 
single peak is greater than clipping threshold 

...Abstract (Basic): The method involves performing a correlation
calculation on a first frame of a speech waveform. The correlation
calculation for the first frame produces one or more correlation
peaks at respective numbers of delay samples. A single correlation
peak is determined from the one or more correlation peaks. The
single peak has a peak location (Pd) comprising a first number of
delay samples...

...The method further involves searching for a peak location (Pd') where
the peak location Pd of the single correlation peak is a
multiple of the peak location Pd'. The peak location Pd' has a
correlation peak. The peak location Pd' comprises a second number
of delay samples. Finally the pitch is set equal to the second number
of delay samples indicated by the peak location Pd'...



...ADVANTAGE - More accurate estimate of pitch of received waveform.
Disregards second and higher multiples of true pitch, more accurately
...Title Terms: PEAK ; 
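
The search for a sub-multiple peak location Pd' can be sketched as below. The sketch assumes exact integer sub-multiples, a simple local-maximum test, and that a candidate must also exceed the clipping threshold; the parameter names and the preference for the shortest qualifying lag are illustrative assumptions rather than details from the record.

```python
def is_local_peak(corr, lag):
    """True if corr has a local maximum at the given lag."""
    return (0 < lag < len(corr) - 1
            and corr[lag] > corr[lag - 1]
            and corr[lag] >= corr[lag + 1])

def refine_pitch(corr, pd, clip_threshold, max_divisor=4, min_lag=20):
    """Replace pd with a sub-multiple pd' that also carries a peak.

    corr: correlation values indexed by delay in samples.
    pd: delay of the single strongest peak.
    Returns pd' if pd is an exact multiple of a lag with a qualifying
    peak, otherwise pd unchanged.
    """
    for divisor in range(max_divisor, 1, -1):  # shortest candidate first
        if pd % divisor != 0:
            continue
        candidate = pd // divisor
        if candidate < min_lag:
            continue
        if is_local_peak(corr, candidate) and corr[candidate] > clip_threshold:
            return candidate
    return pd
```

When the strongest peak sits at twice the true period, the check at pd / 2 recovers the true pitch lag.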
Processing speech signal for voice synthesis system - deepening concave
...deepening concave part between spectral formants and emphasising
peaks

...Abstract (Basic): The speech signal processing method consists of
several steps. First, the signal (S1) is subjected to a high pass
formant emphasis process. Next, a second emphasis process is applied
to the speech signal to extend the entire region, more specifically...

...the valley of the frequency. Subsequently, a third emphasis process is 
applied to emphasize the peak magnitude of a formant in the voice 
frame in the upright part of the speech signal (S3) . Finally, a fourth 
emphasis process is... 

...Title Terms: PEAK 
Excitation signal synthesis for
for speech coding system...
...synthesising linear prediction filter coeffts. during erased frames
using weighting extrapolation to obtain bandwidth expansion of peaks
filter response

Previous Publ. patent JP 7311597
Based on patent EP 673017
frame erasure of packet data loss e.g.
in

...Abstract (Basic): method for a signal imitating human speech involves
storing samples of a signal from a first excitation signal generator.
In response to a signal indicating bit erasure, a second excitation 
signal is synthesised based on previously stored samples of the first 
excitation signal. The second signal is filtered to synthesise a 
human type speech signal... 

...Abstract (Equivalent): use by a decoder which experiences an erasure of 
input bits, the decoder including a first excitation signal generator 
responsive to said input bits and a synthesis filter responsive to an 



...storing samples of a first excitation signal generated by said first 
excitation signal generator... 



...responsive to a signal indicating the erasure of input bits,
synthesizing a second excitation signal based on previously stored
samples of the first excitation signal; and...

...filtering said second excitation signal to synthesize said signal 
reflecting human speech. . . 

...wherein the step of synthesizing a second excitation signal comprises 
the steps of . . . 

...forming said second excitation signal based on said identified set of
excitation signal samples...
...Title Terms: FRAME ; 
temporal masking properties of human auditory system and selecting
block size by comparing differences between peak values in each time 
interval 

. . .Abstract (Basic) : process in which digital audio signals having spectral 
and temporal structure are decomposed into several frames , involves 
defining the audio signal into time intervals according to a temporal 
masking properties of the human auditory system, and obtaining a peak 
value in each of the time intervals... 

...Differences among the peak values are calculated, and a block size is 
selected based upon a comparison of the... 

...Abstract (Equivalent): A method of determining a block size for a frame 
which is a part of a digital audio signal having temporal structure, 
said block size being used for a transform coding process which 
composes digital audio signals into frequency spectral frames , 
comprising steps of . . . 

...said frame being composed of four continuous time intervals... 

...obtaining a peak value in each of said time intervals... 

...calculating a first difference between said peak value of one of
said time intervals and said peak value of an adjacent time interval



...calculating a second difference between said peak value of said one 
of said time intervals and said peak value of another of said time 
intervals . . . 

...selecting said block size for said frame, based on whether said first
difference or said second difference exceeds a predefined value...
...Title Terms: PEAK ; 
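
The block-size decision in the claim steps above can be sketched as follows. The frame is split into four intervals as stated; the block lengths, the use of the interval two positions earlier as the "other" interval, the test on rises only, and the choice of a short block when a difference exceeds the threshold are all assumptions for illustration.

```python
def select_block_size(frame, threshold, long_block=1024, short_block=256):
    """Choose a transform block size from peak differences.

    frame: samples of one frame (at least four), split into four equal
    time intervals. Returns short_block if the peak of any interval
    rises above an earlier peak by more than threshold, else long_block.
    """
    n = len(frame) // 4
    if n == 0:
        raise ValueError("frame must contain at least four samples")
    peaks = [max(abs(x) for x in frame[i * n:(i + 1) * n]) for i in range(4)]
    for i in range(1, 4):
        first_diff = peaks[i] - peaks[i - 1]  # versus adjacent interval
        second_diff = peaks[i] - peaks[i - 2] if i >= 2 else 0.0
        if first_diff > threshold or second_diff > threshold:
            return short_block  # sudden rise: favour time resolution
    return long_block
```

A frame whose level jumps from 0.1 to 0.9 halfway through selects the short block, while a steady frame keeps the long block.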
...Abstract (Basic): 48 sub 1 -48 sub n), rectifier (50) and low-pass
filter (52) or a peak detector following more rapidly the possible
abrupt increases in frequency-localised energy...

...Abstract (Equivalent): channel vocoder including a means (36) of
reception and extraction of the data in a frame, a means (38) of
producing an excitation signal, the said synthesis subassembly
including: n synthesis...

...means (38) of producing the excitation signal, at least one subtracter 
(62) receiving on a first input the energy signal delivered by a 
synthesis channel and on a second input the signal coming from the 
means of reception (36) and representing the energy signal... 
APPLICANT(s): DAINIPPON PRINTING CO LTD
APPL. NO.: 10-283454 [JP 98283454] 
FILED: September 18, 1998 (19980918) 

ABSTRACT 

...acoustic signal to be encoded is PCM (pulse code modulation)-encoded and
taken in as acoustic data, and plural unit sections are set on the
time axis. A Fourier transform is performed for each unit section and a
spectrum S is obtained. A prescribed threshold value L is set, and each
continuous part of the spectrum S whose intensity is at or above this
threshold value L is recognized as one of the formants F1-F5 characteristic
of the voice. The maximum peak frequency in each formant is
extracted as a representative frequency representing the formant, and one

...Each representative frequency is replaced by a note number of
MIDI (musical instrument digital interface) data, and the acoustic signal in
the unit section is encoded by this note number.
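
The two steps described in this abstract, picking a representative frequency per formant and replacing it with a MIDI note number, can be sketched as below. The abstract does not give the frequency-to-note mapping; the usual equal-temperament convention (note 69 = 440 Hz) is assumed, and the function names are hypothetical.

```python
import math

def representative_frequencies(spectrum, freqs, threshold):
    """Peak frequency of each contiguous run of bins at or above threshold."""
    reps, group = [], []
    for f, s in zip(freqs, spectrum):
        if s >= threshold:
            group.append((s, f))
        elif group:
            reps.append(max(group)[1])  # frequency of the strongest bin
            group = []
    if group:
        reps.append(max(group)[1])
    return reps

def frequency_to_midi_note(freq_hz):
    """Nearest MIDI note number, assuming A4 = 440 Hz = note 69."""
    note = int(round(69 + 12 * math.log2(freq_hz / 440.0)))
    return max(0, min(127, note))  # clamp to the valid MIDI range
```

Each unit section would then be represented by the note numbers of its formants rather than by raw PCM samples.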
OCEANOGRAPHIC ACOUSTIC TOMOGRAPHIC DATA ANALYZER

ABSTRACT 

PROBLEM TO BE SOLVED: To track the peak of received signal data by 
eliminating the influence of tidal phenomena by sampling received signal... 

...a receiving system signal data memory 1 at every observation from a data 
input-output section 11. The received signal data are divided into a 
plurality of data groups by sampling the data in a period which is close 
to the tidal periodicity and tracking is performed in accordance with the 
received signal data in each divided data group. Of the periodically 
observed received signal data, namely, a plurality of observed received 
signal data observed in a period which is close to the lunar... 
Title: A model of dynamic auditory perception and its application to
robust word recognition 

Author(s): Strope, B.; Alwan, A. 

Author Affiliation: Dept. of Electr. Eng., California Univ., Los Angeles, 
CA, USA 

Journal: IEEE Transactions on Speech and Audio Processing vol.5, no. 5 
p. 451-64 

Publisher: IEEE, 

Publication Date: Sept. 1997 Country of Publication: USA 

CODEN: IESPEJ ISSN: 1063-6676 

SICI: 1063-6676(199709)5:5L.451:MDAP;1-W

Material Identity Number: P947-97005

U.S. Copyright Clearance Center Code: 1063-6676/97/$10.00
Language: English
Subfile: B C
Copyright 1997, IEE

...Abstract: common automatic speech recognition (ASR) front end and 
provide adaptation and isolation of local spectral peaks . A dynamic model 
consisting of a linear filterbank with a novel additive logarithmic 
adaptation stage... 

...An extensive series of perceptual forward masking experiments, together
with previously reported forward masking data, determine the model's
dynamic parameters. Once parameterized, the simple exponential dynamic
mechanism predicts the nature of forward masking data from several
studies across wide ranging frequencies, input levels, and probe delay
times. An initial evaluation of the dynamic model together with a local
peak isolation mechanism as a front end for dynamic time warp (DTW) and
hidden Markov model (HMM) word recognition systems shows an improvement in
robustness to background noise when compared to Mel-frequency cepstral
coefficients (MFCC), linear prediction cepstral coefficients (LPCC),
and relative spectra (RASTA) based front ends.

...Identifiers: local spectral peaks isolation... 

...local spectral peaks ; ... 

...Mel-frequency cepstral coefficients ; ... 
...linear prediction cepstral coefficients ; 
Title: Towards feature-based speech metric
Author(s): Bayya, A.; Hermansky, H. 

Author Affiliation: US West Adv. Technol., Englewood, CO, USA 
Conference Title: ICASSP 90. 1990 International Conference on Acoustics, 
Speech and Signal Processing (Cat. No. 90CH2847-2) p. 781-4 vol.2 
Publisher: IEEE, New York, NY, USA 

Publication Date: 1990 Country of Publication: USA 5 vol. 2970 pp. 
U.S. Copyright Clearance Center Code: CH2847-2/90/0000-0781$01.00 



Conference Sponsor: IEEE 

Conference Date: 3-6 April 1990 Conference Location: Albuquerque, NM, 
USA 

Language: English 
Subfile: B 

Abstract: A speech metric which directly uses spectral features such as 
spectral peak frequencies and bandwidths is proposed and evaluated. The 
spectral features either are derived directly by solving the all-pole model 
polynomial to get spectral peak frequencies and bandwidths and fitting 
the linear regression line to the logarithmic spectrum of the model or are 
estimated as a linear combination of the several lower cepstral 
coefficients of the all-pole model spectrum. The performance of the 
studied metric in speaker-independent... 
Identifiers: peak bandwidths... 

...spectral peak frequencies... 
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Title: An approach to speaker adaptation based on analytic functions 

Author(s): McDonough, J.; Zavaliagkos, G.; Gish, H. 

Author Affiliation: BBN Syst. & Technol. Corp., Cambridge, MA, USA 

Conference Title: 1996 IEEE International Conference on Acoustics, 
Speech, and Signal Processing Conference Proceedings (Cat. No. 96CH35903) 
Part vol. 2 p. 721-4 vol. 2 

Publisher: IEEE, New York, NY, USA 

Publication Date: 1996 Country of Publication: USA 6 vol. lvii+3588 
pp. 

ISBN: 0 7803 3192 3 Material Identity Number: XX96-02717 

U.S. Copyright Clearance Center Code: 0 7803 3192 3/96/$5.00 
Conference Title: 1996 IEEE International Conference on Acoustics, 
Speech, and Signal Processing Conference Proceedings 
Conference Sponsor: Signal Process Soc. IEEE 

Conference Date: 7-10 May 1996 Conference Location: Atlanta, GA, USA 
Language: English 
Subfile: B C 
Copyright 1997, IEE 

...Abstract: formulate a novel approach to speaker adaptation. It is 
predicated upon the fact that the cepstral coefficients used as feature 
vectors in most state of the art speech recognition systems are 
coefficients... 

... an analytic function of a complex-valued argument. This analytic 
function can be characterized by several poles and zeros in the complex 
plane corresponding to spectral peaks and nulls, respectively. Speaker 
adaptation can be viewed as the estimation of a function that... 
...Identifiers: cepstral coefficients; ... 

...spectral peaks; 
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01797549 ORDER NO: AADAA-I993524 8 

THE RATIO SPECTRUM (DISCRETE TIME, POWER SPECTRUM REPRESENTATION) 

Author: LIM, SHAO-JEN 
Degree: PH.D. 
Year: 1999 

Corporate Source/Institution: UNIVERSITY OF FLORIDA (0070) 
Source: VOLUME 60/06-B OF DISSERTATION ABSTRACTS INTERNATIONAL. 
PAGE 2858. 131 PAGES 

...have developed a novel power spectrum representation called the 
ratio spectrum that provides many advantages over the 
standard power spectrum, particularly for analog implementations. The ratio 
spectrum is formed... 

...vector for speech recognition systems. Initial results show that the 
ratio spectrum outperforms LP-derived cepstral coefficients, peaks 
found through LP analysis and filter banks in phoneme recognition 
experiments. (4) The ratio... 
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Title: On the use of variable frame rate analysis in speech recognition 
Author(s): Qifeng Zhu; Alwan, A. 

Author Affiliation: Dept. of Electr. Eng., California Univ., Los Angeles, 
CA, USA 

Conference Title: 2000 IEEE International Conference on Acoustics, 
Speech, and Signal Processing. Proceedings (Cat. No. 00CH37100) Part 
vol.3 p. 1783-6 vol.3 

Publisher: IEEE, Piscataway, NJ, USA 

Publication Date: 2000 Country of Publication: USA 6 vol. lxxx+3906 
pp. 

ISBN: 0 7803 6293 4 Material Identity Number: XX-2000-01776 

U.S. Copyright Clearance Center Code: 0 7803 6293 4/2000/$10.00 
Conference Title: Proceedings of 2000 International Conference on 
Acoustics, Speech and Signal Processing 

Conference Sponsor: IEEE; Signal Process. Soc 

Conference Date: 5-9 June 2000 Conference Location: Istanbul, Turkey 

Language: English 

Subfile: B C 

Copyright 2000, IEE 
Title: On the use of variable frame rate analysis in speech recognition 

...Abstract: discriminating and identifying speech sounds. These changes 
can occur over very short time intervals. Computing frames every 10 ms, 
as commonly done in recognition systems, is not sufficient to capture such 
dynamic changes. In this paper, we propose a variable frame rate (VFR) 
algorithm. The algorithm results in an increased number of frames for 
rapidly-changing segments with relatively high energy and less frames 
for steady-state segments. The current implementation used an average 
data rate which is less than 100 frames per second. For an isolated word 
recognition task, and using an HMM-based speech recognition... 

... accuracy especially at low signal-to-noise ratios. The technique was 
evaluated with mel frequency cepstral coefficient (MFCC) vectors and MFCC 
vectors with enhanced peak isolation. 

Descriptors: cepstral analysis... 

Identifiers: variable frame rate analysis... 

...rapidly-changing segments; ... 

...steady-state segments; ... 

...mel frequency cepstral coefficient 
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Title: Feature extraction for three- segmental speech signals 
Author(s): Tung, S.-L.; Juang, Y.-T. 

Author Affiliation: Dept. of Electr. Eng., Nat. Central Univ., Chung-Li, 
Taiwan 

Conference Title: 1995 International Symposium on Communications Part 
vol.1 p. 287-94 vol.1 

Publisher: Nat. Taiwan Univ, Taipei, Taiwan 



Publication Date: 1995 Country of Publication: Taiwan 2 vol. 
xxii+1235 pp. 

Material Identity Number: XX95-01599 

Conference Title: Proceedings of 1995 International Symposium on 
Communications. ISCOM '95 

Conference Sponsor: Ministr. Educ; Nat. Sci. Council; Ind. Technol.; et al 

Conference Date: 27-29 Dec. 1995 Conference Location: Taipei, Taiwan 
Language: English 
Subfile: B C 
Copyright 1996, IEE 

Title: Feature extraction for three-segmental speech signals 

Abstract: Generally, the feature extraction of speech signals is to 
obtain the LPC cepstrum frame by frame, but it requires a lot of 
memories to represent each word and does not consider... 

... not less than the others. Firstly, we propose a simple way to find the 
pitch peaks, then according to the pitch peaks, we divide each word 
into three segments: consonant-segment, vowel-segment and 
residual-segment, finally we select one frame in each segment to compute 
the LPC-cepstrum. From our experiments, this method obtains a good result 
and shows that the variances between cepstral coefficients are smaller than 
the others. 

...Identifiers: three-segmental speech signals... 
...pitch peaks; ... 
...consonant-segment; ... 
...vowel-segment; ... 
...residual-segment 
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15090562 PASCAL No. : 01-0250265 

Efficient automatic recognition of spoken digit strings 

OSHAUGHNESSY Douglas; TOLBA Hesham 

INRS-Telecommunications, 900 de la Gauchetiere west, P.O. Box 644, 
Montreal, PQ H5A 1C6, Canada 

Journal: The Journal of the Acoustical Society of America, 2001-05-01, 
109 (5) p. 2316 

Language: English 

Copyright (c) 2001 American Institute of Physics. All rights reserved. 

... application, such recognition was investigated here under different 
conditions. Traditional hidden Markov model approaches with cepstral 
analysis were not used, because they are computationally intensive and have 
not always worked well under adverse acoustic conditions. Simpler spectral 
analysis was used, combined with a segmental approach. The analysis 
focuses on locations of spectral peaks, similar to formant tracking, but 
without the need to estimate peaks for all time frames. The limited 
nature of the vocabulary (i.e., ten digits) allows this simpler approach. 
High... 
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Title: Segmentation of color lip images by spatial fuzzy clustering 
Author(s): Liew, A.W.-C; Shu Hung Leung; Wing Hong Lau 

Author Affiliation: Dept. of Comput. Eng. & Inf. Technol., City Univ. of 
Hong Kong, China 

Journal: IEEE Transactions on Fuzzy Systems vol.11, no. 4 p. 542-9 
Publisher: IEEE, 

Publication Date: Aug. 2003 Country of Publication: USA 

CODEN: IEFSEV ISSN: 1063-6706 
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Material Identity Number: P984-2003-004 
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Author (s): Liew, A.W.-C; Shu Hung Leung ; Wing Hong Lau 
...Descriptors: speech recognition 

...Identifiers: automatic speech recognition system 
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Title: Lip contour extraction from color images using a deformable model 
Author(s): Liew, A.W.-C; Shu Hung Leung; Wing Hong Lau 

Author Affiliation: Dept. of Comput. Eng. & Inf. Technol., City Univ. of 
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Journal: Pattern Recognition vol.35, no. 12 p. 2949-62 
Publisher: Elsevier, 

Publication Date: Dec. 2002 Country of Publication: UK 
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Subfile: B C 
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Author (s): Liew, A. W. -C . ; Shu Hung Leung ; Wing Hong Lau 

...Abstract: use of visual information from lip movements can improve the 
accuracy and robustness of a speech recognition system. In this paper, a 
region-based lip contour extraction algorithm based on deformable... 

...Descriptors: speech recognition 

...Identifiers: speech recognition system 
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Author: Shu, Chang-Qing 

Corporate Source: BBN Technologies, Cambridge, MA, USA 

Conference Title: Proceedings of the 1998 4th International Conference on 
Signal Processing Proceedings, ICSP '98 

Conference Location: Beijing, China Conference Date: 19981012-19981016 
E.I. Conference No.: 55222 

Source: International Conference on Signal Processing Proceedings, ICSP v 
1 1998. p 646-649 

Publication Year: 1998 
CODEN: 002534 
Language: English 

Title: Selected phoneme rejection grammar for a speech recognition 
system 

Author: Shu, Chang-Qing 

Descriptors: Speech recognition; Pattern recognition systems; 
Computational grammars 
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Title: Duration modeling in large vocabulary speech recognition 

Author: Anastasakos, Anastasios; Schwartz, Richard; Shu, Han 
Corporate Source: Northeastern Univ, Boston, MA, USA 

Conference Title: Proceedings of the 1995 International Conference on 
Acoustics, Speech, and Signal Processing. Part 1 (of 5) 

Conference Location: Detroit, MI, USA Conference Date: 19950509-19950512 

E.I. Conference No.: 43559 

Source: Speech ICASSP, IEEE International Conference on Acoustics, Speech 
and Signal Processing - Proceedings v 1 1995. IEEE, Piscataway, NJ, 
USA, 95CH35732. p 628-631 

Publication Year: 1995 

CODEN: IPRODJ ISSN: 0736-7791 

Language: English 

Title: Duration modeling in large vocabulary speech recognition 

Author: Anastasakos, Anastasios; Schwartz, Richard; Shu, Han 

Abstract: This paper presents a study of different methods for phoneme 
duration modeling in large vocabulary speech recognition. We investigate 
the employment of phoneme duration and the effect of context, speaking rate 
and lexical stress in the duration of phoneme segments in a large 
vocabulary speech recognition system. The duration models are used in a 
postprocessing phase of BYBLOS, our baseline... 

Descriptors: Speech recognition; Character recognition; Markov 
processes; Mathematical models; Speech analysis; Algorithms; Probability 

Identifiers: Vocabulary speech recognition system; Hidden semi-Markov 
models; Parzen-window method; Duration modeling; Human listeners 
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Copyright (c) 2000 American Institute of Physics. All rights reserved. 

Rejection grammar using selected phonemes for speech recognition system 
SHU Chang-Qing 
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processing; Speech recognition equipment; Natural languages; Speech 
intelligibility 



25/3, K/6 (Item 2 from file: 144) 

DIALOG (R) File 144: Pascal 

(c) 2003 INIST/CNRS. All rts. reserv. 

13451871 PASCAL No. : 98-0147032 

Total least squares linear prediction for frequency estimation with 
frequency weighting 

ICASSP 97: international conference on acoustics, speech, and signal 
processing : Munich, April 21-24, 1997. Volume V: Statistical signal and 
array processing, applications 

SHU HUNG LEUNG ; TIN HO LEE; WING HONG LAU 

Department of Electronic Engineering, City University of Hong Kong, 83 
Tat Chee Avenue, Kowloon, Hong Kong 

IEEE, New York NY, United States. 

International conference on acoustics, speech, and signal processing 
(Munich DEU) 1997-04-21 
1997 3993-3996 

Publisher: IEEE Computer Society Press, Washington DC 
Language: English 
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...SPECIFICATION also be advantageously used. The zeroth autocorrelation is 
the frame energy of a given frame. Cepstral coefficient generator 305 
converts each frame into cepstral coefficients (the coefficients 
of the Fourier transform representation ...using Durbin's method, which 
is known in the art. Tapered windower 306 weights the cepstral 
coefficients in order to minimize the effects of noise. Tapered windower 
306 is chosen to lower the sensitivity of the low-order cepstral 
coefficients to overall spectral slope and the high-order cepstral 
coefficients to noise (or other undesirable variability). Temporal 
differentiator 307 generates the first time derivative of the cepstral 
coefficients preferably employing an orthogonal polynomial fit to 
approximate (in this embodiment, a least-squares estimate... 

...over a finite-length window) to produce processed signal S f (n). In 
another embodiment, the second time derivative can also be generated by 
temporal differentiator 307 using approximation techniques known in... 
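
The cepstral weighting step described in this excerpt can be illustrated with a minimal sketch. The excerpt does not name the window shape, so the raised-sine lifter used here is an assumption, and the function names `raised_sine_lifter` and `taper_cepstrum` are hypothetical. The weights are largest for mid-order coefficients and smallest at both ends, which matches the stated aim of de-emphasizing low-order and high-order coefficients.

```python
import math

def raised_sine_lifter(num_coeffs):
    """Assumed window shape: w[k] = 1 + (L/2) * sin(pi * k / L), k = 1..L."""
    L = num_coeffs
    return [1.0 + (L / 2.0) * math.sin(math.pi * k / L) for k in range(1, L + 1)]

def taper_cepstrum(cepstrum):
    """Weight cepstral coefficients c[1..L] (c[0] not included) by the lifter."""
    weights = raised_sine_lifter(len(cepstrum))
    return [w * c for w, c in zip(weights, cepstrum)]

# Usage: weight a 12-coefficient cepstral vector for one frame.
weighted = taper_cepstrum([0.1] * 12)
```

Other tapers would serve the same purpose; the raised-sine form is only one common choice.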
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...Signal Processing, page 501-504, March, 1992. The linear predictive (LP) 
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length less than the expected length by up to 4 frames is considered 
acceptable. When a shorter word is enrolled, the silence at the end is 
not included in the reference template , so that the template itself 
is shorter than was originally expected. If the enrolled word is longer 
than expected. . . 
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... ASR art for finding the most probable sequence of HMM states. 

FIG. 4 is a first part of flow chart of a process 400 for extracting 
feature ...applied to the log magnitude MEL scale frequency components 
for each frame to obtain a cepstral coefficient vector for each frame. 
In step 504 first or higher order differences are taken between 
corresponding cepstral coefficients for two or more frames to obtain at 
least first order inter frame cepstral coefficient differences (deltas). 
In step 506 for each frame the cepstral coefficients and the inter frame 
cepstral coefficient differences are output as a feature vector. 
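
Steps 504 and 506 can be sketched as follows. This is an illustration only: it uses a plain first-order difference between adjacent frames, and setting the delta of the first frame to zero is an assumption, since the excerpt does not say how the first frame is handled. The name `add_deltas` is hypothetical.

```python
def add_deltas(cepstra):
    """cepstra: list of per-frame cepstral coefficient lists.

    Returns one feature vector per frame: the cepstral coefficients
    followed by their first-order inter-frame differences (deltas).
    """
    features = []
    for t, frame in enumerate(cepstra):
        # Assumption: the first frame has no predecessor, so its delta is zero.
        prev = cepstra[t - 1] if t > 0 else frame
        delta = [c - p for c, p in zip(frame, prev)]
        features.append(list(frame) + delta)
    return features

# Usage: two frames with two coefficients each.
feature_vectors = add_deltas([[1.0, 2.0], [1.5, 1.0]])
```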

FIG. 6 is a hardware block diagram of... 
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As shown in Fig. 3, the front end processing module 200 produces a frame 



from digital samples according to a procedure 300. The module first 
produces a frequency domain representation X(f) of the portion of the 
utterance by performing... From the normalized results, the module performs 
cepstral analysis to produce twelve cepstral parameters (step 325). 
The module generates the cepstral parameters by performing an inverse 
cosine transformation on the logarithms of the frequency parameters. 
Cepstral parameters and cepstral differences (described below) have 
been found to emphasize information important to speech recognition more 
effectively than do the frequency parameters. After performing channel 
normalization of the cepstral parameters (step 330), the module produces 
twelve cepstral differences (that is, the differences between cepstral 
parameters in successive frames) (step 335) and twelve cepstral 
second differences (that is, the differences between cepstral 
differences in successive frames) (step 340). Finally, the module 
performs an IMELDA linear combination transformation to select the 
twenty-four most useful parameters from the twelve cepstral 
parameters, the twelve cepstral differences, and the twelve cepstral 
second differences (step 345). 
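
Step 325 can be sketched as a cosine transform applied to the logarithms of the frequency parameters. The exact transform is not given in the excerpt; the DCT-II style basis below, the count of 24 frequency parameters in the usage line, and the name `cepstral_parameters` are all assumptions made for illustration.

```python
import math

def cepstral_parameters(freq_params, num_cepstra=12):
    """Cosine transform of log frequency parameters (assumed DCT-II basis).

    freq_params: positive band energies for one frame.
    Returns cepstral parameters c[1..num_cepstra].
    """
    K = len(freq_params)
    logs = [math.log(x) for x in freq_params]
    return [
        sum(logs[k] * math.cos(math.pi * n * (k + 0.5) / K) for k in range(K))
        for n in range(1, num_cepstra + 1)
    ]

# Usage: a flat spectrum carries no spectral shape, so every
# cepstral parameter from order 1 upward is (numerically) zero.
flat = cepstral_parameters([2.0] * 24)
```

The cepstral differences and second differences of steps 335 and 340 would then be frame-to-frame subtractions of these parameters.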

Referring again to Fig. 2, a recognizer 215 receives and processes the... 
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English Abstract 

...voice activity or VAD method in a voice signal, particularly in 
telephonic applications, comprising: a first step aimed at acquiring 
the voice signal (1) divided in segments or frames having a time 
duration d, a second step aimed at computing, for each frame, at least 
three of the following five parameters: the energy differential over the 
whole band... 

...over the band 0-1 kHz, DeltaE_l, the zero crossing rate differential, 
DeltaZCR, the second cepstral coefficient, c_2, and the fifth 
cepstral coefficient, c_5, a third step in which a neural network 
process is carried out in order to provide, based upon at least three of 
said five parameters, for each frame, an output value Y in the range 
defined by a minimum value Y_min... 

Claim 

... over the band 0-1 kHz, DeltaE_l, 

the zero crossing rate differential, DeltaZCR, 

the second cepstral coefficient, c_2, and 

the fifth cepstral coefficient, c_5, 
- a third step in which a neural network process is carried out in 
order to provide, based upon at least three of said five parameters, for 
each frame, an output value Y in the range defined by a minimum value 
Y, , (inverted exclamation. . . 
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Detailed Description 

... to the back end every time a new cepstrum is calculated i.e. every 
speech frame is processed to form a feature vector. Often, additional 
information concerning the time derivatives of... 

...MFCC is also provided. For example, a feature vector may also contain 
information about the first and second time-derivatives of each 
cepstral coefficient. A conventional method for incorporating temporal 
information into speech vectors is to apply linear regression to a series 
of successive cepstral coefficients to generate first and second 
difference cepstra, referred to as 'delta' and 'delta-delta' cepstra (as 
indicated in the dashed... 
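
The linear-regression approach mentioned in this excerpt can be sketched as below. The regression formula is the widely used least-squares slope over a symmetric window; the half-window of 2 frames and the repetition of edge frames are assumptions, and `regression_deltas` is a hypothetical name.

```python
def regression_deltas(frames, half_window=2):
    """Least-squares slope of each coefficient over +/- half_window frames.

    d[t] = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2),  k = 1..half_window
    """
    num = len(frames)
    denom = 2.0 * sum(k * k for k in range(1, half_window + 1))
    deltas = []
    for t in range(num):
        row = []
        for i in range(len(frames[t])):
            acc = 0.0
            for k in range(1, half_window + 1):
                # Assumption: frames beyond either end repeat the edge frame.
                ahead = frames[min(t + k, num - 1)][i]
                behind = frames[max(t - k, 0)][i]
                acc += k * (ahead - behind)
            row.append(acc / denom)
        deltas.append(row)
    return deltas

# Usage: 'delta' cepstra, then 'delta-delta' by applying the regression again.
cepstra = [[float(t)] for t in range(7)]
delta = regression_deltas(cepstra)
delta_delta = regression_deltas(delta)
```

Applying the same regression to the delta stream is one conventional way to obtain the second difference; a system could equally use a separate window length for it.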
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Detailed Description 

attention words (Step 402). This allows the NLICS to accept 
non-prompted user requests, but first the system must be told that a 
user request is coming. The attention word accomplishes... 

...dimensional feature vector is derived from the acoustic data. These 
features consist of Mel-Frequency Cepstral coefficients 1-12 and the 
first and second order derivatives of MFC coefficients 0 Thus, feature 
vectors are created from the acoustic data... 
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Detailed Description 

... of speech recognition 

parameters other than cepstral coefficients. 

The fourteen parameters for each sampling time-frame are arranged, or 
formatted, into a corresponding vector, also known as an array, as shown 
in FIG. 1. Vector 131 corresponds to sampling time-frame 121, vector 132 
corresponds to sampling time-frame 122, vector 133 corresponds to 
sampling time-frame 123, and vector 134 corresponds to sampling 
time-frame 124. Such a vector can generally be represented as 

y(m) = [ c(m) ... log[E(m)] ] 

The speech recognition parameters are processed prior to transmission 
from a first location to a second location. In the embodiment 
described below this is carried out as follows. The parameters from... 
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Claim 

said channels in said logarithmic channel energy. 

14. The system of claim 12 wherein a first time cosine transform (536) 
converts said static features (534) into delta features (542), and 
wherein... 

...features (534) into delta-delta features (544). 

. The system of claim 14 wherein said first time cosine transform (536) 
and said second time cosine transform (536) each perform a centered... 

. . .P)Cos@ 0;T 
at k=-M 2M+I 

where Ct (p) is a pth cepstral coefficient at a time frame t, M is 
half of a window size used to estimate differential coefficients, and o 
. . .1 C, +k (P 
at k=-M 2M+I 

where Ct (p) is a pth cepstral coefficient at a time frame t, M is 
half of a window size used to estimate differential coefficients, and o 
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... a telephone speech recognition engine uses the 

commonly known LPC analysis method to derive 12 cepstral coefficients 
and log energy, along with first and second order derivatives. A 
preferred embodiment of a microphone speech recognition engine uses the 
commonly known... 

...FFT method to accomplish the same purpose. The result for both engines 
for each speech frame is a vector of 12 cepstra, 12 delta cepstra, 12 
delta delta cepstra, 
delta log... 
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the steps of: 
4A) initializing to desired settings a memory unit 
having at least a first memory location (M1) for onset 
condition storage of an immediately preceding speech frame 
(IPSF) and... CSF and for determining an onset condition of the 
CSF; 

4E) utilizing the at least first and second memory 
locations for storing the onset condition of the CSF, the LPC 
coefficients... of the CSF 

indicates an onset speech frame, setting the IPSF onset 
condition in the first memory location to ONSET; and 
4I2) where the onset condition of the CSF 

indicates a non-onset speech frame, setting the IPSF onset 
condition in the first memory location to NON-ONSET, 
4J) and, where selected, wherein at least one of 4J1... 

...ONSET; 

4J2) the log spectral distance is determined by 
determining a mean squared error of cepstral coefficients 
between the selected current frame and its immediately 
preceding frame, the cepstral coefficients for a speech frame 
being determined iteratively from the LPC coefficients and 
prediction error energy for the CSF; 
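
The computation in 4J2 can be sketched with the standard recursion from LPC coefficients to cepstral coefficients, followed by a mean squared error between two frames. The sign convention (all-pole model 1 / (1 - sum a_k z^-k)), the use of log prediction error energy as c[0], and the names `lpc_to_cepstrum` and `cepstral_mse` are assumptions; the claim does not spell out these details.

```python
import math

def lpc_to_cepstrum(lpc, error_energy, num_cepstra):
    """Iteratively derive cepstral coefficients from LPC coefficients.

    Assumed model: H(z) = G / (1 - sum_{k=1..p} a[k] z^-k).
    c[n] = a[n] + sum_{k=1..n-1} (k/n) * c[k] * a[n-k], with a[n] = 0 for n > p.
    """
    p = len(lpc)
    a = [0.0] + list(lpc)              # 1-based indexing: a[1..p]
    c = [0.0] * (num_cepstra + 1)
    c[0] = math.log(error_energy)      # assumption: c[0] is log error energy
    for n in range(1, num_cepstra + 1):
        acc = a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc += (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c

def cepstral_mse(c_current, c_previous):
    """Mean squared error between two cepstral vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(c_current, c_previous)) / len(c_current)

# Usage: distance between a current frame and its preceding frame.
current = lpc_to_cepstrum([0.5], 1.0, 3)
previous = lpc_to_cepstrum([0.4], 1.0, 3)
distance = cepstral_mse(current, previous)
```

For a single-pole model with coefficient a, the recursion gives c[n] = a^n / n, which is a convenient sanity check.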

4J3... 
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Detailed Description 

effect of telephone line distortion on the spectrum of 
the speech signal, the zero and first cepstral coefficients, known 
as MFCC0 and MFCC1, may be given zero weights. Processing preferably 
also includes deriving an indication related to the rate of change 
of each coefficient (first order difference) and its second order 
difference. An algorithm for this purpose (see Figure 4) begins by 
setting the number ...an operation 52 
and setting (operation 53) a variable k nominally representing a 
particular recent frame to -kmax where kmax is the number of 
previous and succeeding frames relative to the frame k to be used 
in forming each jth first order rate of difference dj. 

An operation 54 and a test 55 cause (2kmax + 1. 
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...SPECIFICATION vectors as created by the filter banks. 

The same is also valid for the local peak detection method known from 
ICASSP 84 proceedings, volume 1, pages 9.9.1 to 9.9.4, which is also 
not applicable to real-time speech recognition systems. 

In view of the drawbacks of the prior speech recognition techniques, it 
...means are respectively realizable with exclusive hardware without use 
of the processor. 

The above and other objects, features and advantages of the present 
invention will become more apparent from the following description when 
taken in conjunction with the accompanying drawings in which a 
preferred embodiment of the present invention is shown by way of 
illustrative example. 

Fig. 1 is a flowchart illustrating processing by a prior speech 
recognition method; 

Fig. 2 is a block diagram illustrating a speech recognition 
apparatus according to the present invention; 

Fig. 3 is a view illustrating a characteristic of a band pass 
filter of a signal processor 15... 
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Detailed Description 

... the fact that not all electric power costs the same to generate. Power 
generated during peak times is more expensive than "base-line" power. 
For demand side management, utility companies will ... themselves typically 
are referred to as users, in the context of the network. Blocks or 
frames of data are transmitted over a link along a path between nodes of 
the network... network. The three layers of the X.25 interface 
architecture are the physical level, the frame level and the packet 
level. Although data communication between DCEs of the network is 
routinely ... years. NINS became the focus of service providers in 1995 as 
they saw revenues for frame relay network services double for two years 
in a row. What began as a way to boost the popularity of frame relay 
services by offering to lease and manage routers has blossomed into a 
diverse set... 

...end of the continuum consists of NINS for current network services, 
including leased lines, frame relay, and X On the far end is outsourced 
MNS characterized by long-term contracts ... Core" Network Architecture 
The current wire-line "Core" network consists of parallel PSTN, SMDS, 
ATM, Frame-Relay, B/PRI and IP networks. The PSTN network has been 
evolving over the last ... 3/STM-1). 

The data networks consist of many technologies e.g. SMDS, ATM, 
frame-relay and IP. 

In some cases, these data networks themselves are parallel networks, in 
other... 

...share a common technology in the backbone (e.g. ATM can be the backbone 
for frame relay and IP data networks). These data networks share the 
same SONET based backbone with... 
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...SPECIFICATION a third embodiment of the present invention; 

Fig. 5(A) is a view exemplarily illustrating frame power Pi 
under noiseless environment; 

Fig. 5(B) is a view exemplarily illustrating frame power Pi' 
under noisy environment due to automobiles; 

Fig. 6(A) to (C) are respectively views illustrating evaluation 
of a local peaks vector; 

Fig. 7 is a view illustrating evaluation for pattern similarity... 

...Fig. 9 is a view illustrating linear expansion of a time axis of an 
input pattern; 

Fig. 10 is a flowchart illustrating evaluation for a local peaks 
vector in the third and fourth embodiments of the present invention; 
and 

Figs. 11(A) to (E) are respectively views illustrating the 
evaluation for...m designates the number of a reference pattern while j 
the number of a speech frame. For simplifying the description, a start 
point of the learning speech yielded in the voiced interval detection 
(S13) is assumed to be 1 while an end point... 



...preparing the reference pattern as described above was designated as the 
registration processing, processing of recognizing an input speech is 
designated as recognition processing. Thereupon, any speech input in 
the recognition processing is called an input speech. For this input 
speech, the numbers of speech frames... the first through fourth 
embodiments described previously was performed by the nonlinear matching 
using the dynamic programming method, but this evaluation can be 
performed by means of the linear matching. When performing... 
...input speech is subjected to time axis linear expansion or compression 
into a prescribed speech frames length. 
"Linear expansion or compression" 

The linear expansion or compression processing is mainly to 
facilitate linear matching described later and is furthermore to 
facilitate area control upon... 

...time axis linear expansion will be described. A case is described for 
brevity where input local peaks vectors are linearly expanded into 
thirty two speech frames. The start point and the end point are 
respectively assumed to be S and E... 

...speech frame number after the linear expansion is assumed to be i' (i' = 
1 through 32), and a speech frame number before the linear 
expansion or compression is assumed to be i. The speech frame... 
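
The time-axis linear expansion described in this excerpt can be illustrated with a short sketch. The patent's own mapping between i' and i is given only as an image in the original document, so the evenly spaced, nearest-frame mapping below is an assumption, and `linear_expand` is a hypothetical name.

```python
def linear_expand(vectors, start, end, target_len=32):
    """Map frames start..end (1-based, inclusive) onto target_len frames.

    Assumed mapping: i = S + (i' - 1) * (E - S) / (target_len - 1),
    rounded to the nearest source frame.
    """
    out = []
    for i_new in range(1, target_len + 1):
        pos = start + (i_new - 1) * (end - start) / (target_len - 1)
        out.append(vectors[int(round(pos)) - 1])
    return out

# Usage: expand a 10-frame input (S = 1, E = 10) to 32 frames.
frames = list(range(1, 11))   # stand-ins for local peaks vectors
expanded = linear_expand(frames, 1, 10)
```

The same function compresses when the input has more than 32 frames, since the mapping only depends on S, E and the target length.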

...19): (see image in original document) 

and an input local peaks vector Ri in the ith speech frame before 
the linear expansion or compression is assumed to be an input local peaks 
...speech feature vectors shown in the second and fourth embodiments 
described previously, whereby highly accurate speech recognition can 
be assured. Furthermore, as described in the third embodiment... according 
to the present invention, local peaks vectors are evaluated by 
estimating window vectors from feature vectors after spectrum 
normalization of an input speech, smoothing the window vectors, and 
multiplying the feature vector after... 

...spectrum window. Accordingly no erroneous discrimination is produced 
between a local peak due to noises and that due to a speech for 
assuring highly accurate processings in the similarity evaluation with 
each reference pattern... 

...judgement thereof. 

As clearly evidenced from the above description, according to the 
present invention, a speech recognition system with excellent 
recognition accuracy can be assured. 

From the foregoing, it will now be apparent that new and improved 
speech recognition method and system have been found. It should be 
understood of course that the embodiments... 

...CLAIMS name given to a reference pattern having a maximum pattern 
similarity among said pattern similarities evaluated for each 
reference pattern as a recognized result (S19); 

characterized by said step (c) for converting to said local 
peaks vector Ri comprising the steps of: 

(A) evaluating a speech feature vector Bi by subtracting said noise 
pattern N from each feature vector... 

...evaluating a spectrum normalized speech feature vector Zi composed of 
components z_i^k, where i is a subscript indicative 
of a speech frame number and k is a superscript indicative of the 
channel number k = 1 to K, by spectrum normalizing said speech 
feature vector Bi using said least... 
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speech recognition system are slightly different from a system that 
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...therefore an approximation. The optimization in the selection algorithm 

(see Selection of periodic signal contributions) determines the final 
instantaneous fundamental period. 

In each frame the three highest peaks in the summed autocorrelation 
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... extracted and compared with previously trained models. Those 

features are computed from regular intervals, or frames, of speech using 
a technique called "windowing." Each frame is reduced to around two dozen 
parameters. Examples of speech features include linear prediction 
coefficients, mel-warped cepstral coefficients and fundamental 
frequency. 

If we record speech at 11 kHz and use a 10-ms... 
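
The framing step can be made concrete with a small sketch. The excerpt is cut off after "10-ms", so treating 10 ms as the frame length (110 samples at 11 kHz), using non-overlapping frames, and applying a Hamming window are all assumptions; `frame_signal` is a hypothetical name.

```python
import math

def frame_signal(samples, sample_rate_hz=11000, frame_ms=10):
    """Split a signal into non-overlapping windowed frames.

    At 11 kHz, a 10 ms frame holds 11000 * 10 / 1000 = 110 samples.
    """
    frame_len = sample_rate_hz * frame_ms // 1000
    # Hamming window (assumed window type).
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        chunk = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# Usage: 0.1 s of signal at 11 kHz yields ten 110-sample frames.
frames = frame_signal([1.0] * 1100)
```

Each windowed frame would then be reduced to a feature vector such as the cepstral coefficients mentioned in the excerpt.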



12/3, K/2 (Item 1 from file: 88) 

DIALOG (R) File 88:Gale Group Business A.R.T.S. 
(c) 2003 The Gale Group. All rts. reserv. 

03059250 SUPPLIER NUMBER: 14134739 

Speaker identification based on a matrix quantization method. 

Chen, Ming-Shin; Lin, Pei-Hwa; Wang, Hsiao-Chuan 

IEEE Transactions on Signal Processing, v41, n1, p398(6) 

Jan, 1993 

ISSN: 1053-587X LANGUAGE: English RECORD TYPE: Abstract 
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